Literature Data Mining for Biology
Abstract
In this introduction, we summarize the papers included in the session on Literature Data Mining for Biology. We then discuss the need for a challenge evaluation in this field and the steps required to create one: building a shared infrastructure, providing annotated data, and defining and implementing common evaluation metrics. Such an evaluation would let researchers compare differing methods and thereby accelerate progress in the field. In this context, we describe two specific applications: extraction of biological pathways from the literature and automated database curation. For each, we outline the task definition, the creation of an annotated corpus, and the evaluation metrics.
Even though the number and size of sequence databases are growing rapidly, most new information relevant to biology research is still recorded as free text in journal articles and in the comment fields of databases such as the GenBank feature table annotations. As biomedical research enters the post-genome era, new kinds of databases are needed that contain information beyond simple sequences, for example on cellular localization, protein-protein interactions, gene regulation, and the context of these interactions. Forerunners of such databases include KEGG [1], DIP [2], and BIND [3], among others. These databases are still small and largely hand-curated. One factor that can accelerate their growth is the development of reliable literature data mining technologies. This year marks the third time the Pacific Symposium on Biocomputing has devoted an entire session to natural language processing and information extraction for biology. Compared with the last two years, the field has made tremendous...
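The abstract calls for common evaluation metrics over an annotated corpus but does not name them. As a hedged illustration only, the sketch below assumes the widely used precision/recall/F-measure scheme and assumes that each extracted fact is an (agent, relation, target) triple compared by exact match against a hand-annotated gold set. The `Interaction` type, the `score_extraction` function, and the entity names "A" through "E" are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class Interaction(NamedTuple):
    """Hypothetical representation of one extracted fact."""
    agent: str
    relation: str
    target: str


def score_extraction(predicted: Iterable[Interaction],
                     gold: Iterable[Interaction]) -> dict[str, float]:
    """Score system output against a gold annotation set by exact matching.

    Returns precision, recall, and the balanced F-measure (F1).
    """
    pred_set = set(predicted)
    gold_set = set(gold)
    # True positives: extracted facts that also appear in the gold annotation.
    true_pos = len(pred_set & gold_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example with placeholder entity names (not real biological data).
gold = [
    Interaction("A", "binds", "B"),
    Interaction("B", "phosphorylates", "C"),
    Interaction("C", "inhibits", "D"),
    Interaction("D", "binds", "E"),
]
predicted = [
    Interaction("A", "binds", "B"),
    Interaction("C", "inhibits", "D"),
    Interaction("A", "inhibits", "E"),
]
print(score_extraction(predicted, gold))
```

Here two of the three predictions match the gold set, so precision is 2/3 and recall is 2/4. Exact matching is the simplest choice; a real challenge evaluation might also need rules for partial matches, synonyms, or relation direction, which this sketch does not attempt.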
Similar resources
Credit scoring in banks and financial institutions via data mining techniques: A literature review
This paper presents a comprehensive review of the works done, during the 2000–2012, in the application of data mining techniques in Credit scoring. Yet there isn’t any literature in the field of data mining applications in credit scoring. Using a novel research approach, this paper investigates academic and systematic literature review and includes all of the journals in the Science direct onli...
Accomplishments and challenges in literature data mining for biology
We review recent results in literature data mining for biology and discuss the need and the steps for a challenge evaluation for this field. Literature data mining has progressed from simple recognition of terms to extraction of interaction relationships from complex sentences, and has broadened from recognition of protein interactions to a range of problems such as improving homology search, i...
Mining literature for systems biology
Currently, literature is integrated in systems biology studies in three ways. Hand-curated pathways have been sufficient for assembling models in numerous studies. Second, literature is frequently accessed in a derived form, such as the concepts represented by the Medical Subject Headings (MeSH) and Gene Ontologies (GO), or functional relationships captured in protein-protein interaction (PPI) ...
Text Mining of Biomedical Literature Repositories
There is an increasing interest in the development of biomedical text mining applications not only to enable improved literature search, but also to automatically detect pointers between biologically relevant entities described in articles and their corresponding records in existing annotation databases. The rapid growth of natural language data in biomedical sciences (including scientific arti...
a swift heuristic algorithm base on data mining approach for the Periodic Vehicle Routing Problem: data mining approach
periodic vehicle routing problem focuses on establishing a plan of visits to clients over a given time horizon so as to satisfy some service level while optimizing the routes used in each time period. This paper presents a new effective heuristic algorithm based on data mining tools for periodic vehicle routing problem (PVRP). The related results of proposed algorithm are compared with the resu...
Getting to the (c)ore of knowledge: mining biomedical literature
Literature mining is the process of extracting and combining facts from scientific publications. In recent years, many computer programs have been designed to extract various molecular biology findings from Medline abstracts or full-text articles. The present article describes the range of text mining techniques that have been applied to scientific documents. It divides 'automated reading' into...
Publication date: 2001